Amazon's Supply Chain Finance Analytics team is responsible for providing controllership for Amazon's consumer supply chain systems and serves as a strategic partner to the Supply Chain Optimization Technologies (SCOT) team. The work is data-intensive: the team stitches together hundreds of tables produced by different business units to create a unified financial view of supply chain systems that make science- and machine-learning-driven decisions on tens of billions of dollars in supply chain spend. The team faced significant challenges in streamlining its Extract, Transform, and Load (ETL) processes and reducing its data engineering workload while still delivering reliable, consistent, and accurate insights to internal users through high-quality analytics products.
The Challenge
The SCOT Finance Analytics team at Amazon had to produce financial reporting that gave decision-makers a unified financial view of supply chain systems. The team dealt with enormous volumes of data, with some datasets spanning billions of rows and hundreds of columns. Insights had to be generated across many dimensions and filters, including date ranges, geographic regions (country, state, city, zip), business categories, and product categories. However, the data source was performant for only one to three filter dimensions, and the BI tools' extracts typically failed beyond roughly 100 million rows, severely degrading analytics product performance, especially for tables with many columns. Even with sampled data, complex queries took four minutes or more to complete. To work around this, the team initially created more than 20 materialized views of the same table in the data warehouse, each partitioned, sorted, and distributed by different dimensions, and used parametrization functions in the BI tool to select the right data source based on the filters and values a user chose. This approach was neither performant nor scalable, and it required significant ongoing maintenance and management. It also created numerous complex data pipelines, introducing more opportunities for human error, and system responsiveness varied with the data warehouse cluster's load. The team needed a better, more scalable solution to meet its standards.
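To make the limits of that workaround concrete, the selection logic the BI parametrization had to approximate looked roughly like the following. This is a minimal, hypothetical Python sketch, not the team's actual configuration: the view names, dimension keys, and routing rules are illustrative only. Each supported filter combination maps to one of the 20+ pre-built materialized views, and any combination nobody anticipated falls back to the slow, unoptimized path.

```python
# Hypothetical illustration of the "pick the right materialized view" routing
# that the team encoded with BI-tool parametrization. View names and
# dimension keys are made up for the example.

# Each materialized view is partitioned/sorted/distributed for a specific
# set of filter dimensions, so it is only fast for those dimensions.
VIEW_BY_DIMENSIONS = {
    frozenset({"order_date"}): "finance_mv_by_date",
    frozenset({"order_date", "country"}): "finance_mv_by_date_country",
    frozenset({"order_date", "business_category"}): "finance_mv_by_date_category",
    frozenset({"order_date", "country", "product_category"}): "finance_mv_by_date_country_product",
    # ... 20+ variants, one per supported filter combination
}

# Unoptimized base table; queries routed here were the multi-minute slow path.
FALLBACK_VIEW = "finance_base_table"


def pick_view(selected_filters: set[str]) -> str:
    """Return the materialized view tuned for the user's filter selection.

    Any combination that no view was built for falls through to the base
    table and degrades to the slow path described above.
    """
    return VIEW_BY_DIMENSIONS.get(frozenset(selected_filters), FALLBACK_VIEW)


if __name__ == "__main__":
    print(pick_view({"order_date", "country"}))            # finance_mv_by_date_country
    print(pick_view({"country", "state", "city", "zip"}))  # finance_base_table (slow path)
```

Every new reporting dimension meant another view to build, another pipeline to load it, and another routing rule to maintain, which is why the approach could not scale.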
The Solution
The team researched several commercial and open-source solutions, including Apache Kylin, a commercial big data Online Analytical Processing (OLAP) provider, and Dremio, for in-depth evaluation. They had several requirements: the solution needed to produce views with less than a 10-second refresh time for each user click or filter selection, deliver consistent completion of daily backend data appends, query the entire dataset (more than three years of historical data) without reducing scope, require little setup and maintenance labor, and scale compute elastically without bottlenecking resources. Ultimately, the team chose Dremio. They quickly stood up a Dremio instance using an AWS CloudFormation template and could scale compute up and down as needed. Dremio let them do everything through a modern user interface for both SQL and no-code analytics, integrated seamlessly with their existing BI tools, and provided a built-in SQL Runner (SQL IDE) for ad hoc query analysis and exploration. The team set up a reflection build trigger based on new data ingestion into their Amazon Simple Storage Service (Amazon S3) bucket, simplifying their ETL process, and Dremio could build multiple combinations of reflections through its graphical interface. After the evaluation, they chose Dremio over Kylin because it was faster to set up, more user-friendly, and integrated seamlessly with most BI tools.
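As an illustration of the ingestion-triggered reflection build described above, one common way to wire up this pattern is an S3 event notification that invokes an AWS Lambda function, which in turn asks Dremio to refresh the reflections on the affected dataset. The sketch below is a hypothetical outline under stated assumptions, not the team's implementation: the Dremio host, dataset ID, token, authorization scheme, and the exact REST endpoint (assumed here to be the catalog refresh call) should all be verified against the Dremio REST API documentation for your version.

```python
# Hypothetical AWS Lambda handler: when new files land in the S3 bucket that
# backs the Dremio dataset, request a refresh of that dataset's reflections.
# Host, dataset ID, token, auth header, and endpoint path are assumptions.
import json
import os
import urllib.request

DREMIO_URL = os.environ["DREMIO_URL"]          # e.g. https://dremio.internal:9047
DREMIO_TOKEN = os.environ["DREMIO_TOKEN"]      # access token (assumed auth scheme)
DATASET_ID = os.environ["DREMIO_DATASET_ID"]   # id of the S3-backed physical dataset


def lambda_handler(event, context):
    # The S3 event notification lists the objects that were just written.
    new_objects = [
        record["s3"]["object"]["key"]
        for record in event.get("Records", [])
    ]
    print(f"New objects ingested: {new_objects}")

    # Assumed Dremio REST call that refreshes reflections dependent on the
    # physical dataset; verify the path and auth against your Dremio docs.
    req = urllib.request.Request(
        url=f"{DREMIO_URL}/api/v3/catalog/{DATASET_ID}/refresh",
        method="POST",
        headers={
            "Authorization": f"Bearer {DREMIO_TOKEN}",
            "Content-Type": "application/json",
        },
        data=b"{}",
    )
    with urllib.request.urlopen(req) as resp:
        print(f"Reflection refresh requested, status {resp.status}")

    return {"statusCode": 200, "body": json.dumps({"refreshed": DATASET_ID})}
```

With a trigger like this, reflection maintenance follows data arrival automatically instead of running on a hand-managed schedule, which is the ETL simplification the team describes.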
Conclusion
Amazon's Supply Chain Optimization Technologies (SCOT) Finance Analytics team faced challenges in managing its data pipelines while providing reliable, consistent, and accurate insights to internal users through high-quality analytics products. By using Dremio to accelerate queries, streamline ETL, and reduce data engineering workloads, the team achieved its goals and got high-quality insights into the hands of end users quickly. Dremio met all of the team's requirements, allowing them to deliver the end result exactly as imagined, with no compromise on their vision.